Automatic Recognition of Logical Relations for English, Chinese and 
Japanese in the GLARF Framework 


Adam Meyers , Michiko Kosaka*, Nianwen Xue , Heng Ji% Ang Surf, Shasha Liao and Wei Xu 

t New York University, * Monmouth University, °Brandeis University, * City University of New York 


Abstract 

We present GLARF, a framework for repre¬ 
senting three linguistic levels and systems for 
generating this representation. We focus on a 
logical level, like LFG’s F-structure, but com¬ 
patible with Penn Treebanks. While less fine¬ 
grained than typical semantic role labeling ap¬ 
proaches, our logical structure has several ad¬ 
vantages: (1) it includes all words in all sen¬ 
tences, regardless of part of speech or seman¬ 
tic domain; and (2) it is easier to produce ac¬ 
curately. Our systems achieve 90% for En¬ 
glish/Japanese News and 74.5% for Chinese 
News - these F-scores are nearly the same as 
those achieved for treebank-based parsing. 

1 Introduction 

For decades, computational linguists have paired a 
surface syntactic analysis with an analysis represent¬ 
ing something “deeper”. The work of Harris (1968), 
Chomsky (1957) and many others showed that one 
could use these deeper analyses to regularize differ¬ 
ences between ways of expressing the same idea. 
For statistical methods, these regularizations, in ef¬ 
fect, reduce the number of significant differences be¬ 
tween observable patterns in data and raise the fre¬ 
quency of each difference. Patterns are thus easier 
to learn from training data and easier to recognize in 
test data, thus somewhat compensating for the spare¬ 
ness of data. In addition, deeper analyses are often 
considered semantic in nature because conceptually, 
two expressions that share the same regularized form 
also share some aspects of meaning. The specific de¬ 
tails of this “deep” analysis have varied quite a bit, 
perhaps more than surface syntax. 


In the 1970s and 1980s, Lexical Function Gram¬ 
mar's (LFG) way of dividing C-structure (surface) 
and F-structure (deep) led to parsers such as (Hobbs 
and Grishman, 1976) which produced these two lev¬ 
els, typically in two stages. However, enthusiasm 
for these two-stage parsers was eclipsed by the ad¬ 
vent of one stage parsers with much higher accu¬ 
racy (about 90% vs about 60%), the now-popular 
treebank-based parsers including (Charniak, 2001; 
Collins, 1999) and many others. Currently, many 
different “deeper” levels arc being manually anno¬ 
tated and automatically transduced, typically using 
surface parsing and other processors as input. One 
of the most popular, semantic role labels (annota¬ 
tion and transducers based on the annotation) char¬ 
acterize relations anchored by select predicate types 
like verbs (Palmer et ah, 2005), nouns (Meyers et 
ah, 2004a), discourse connectives (Miltsakaki et ah, 
2004) or those predicates that arc paid of particular 
semantic frames (Baker et ah, 1998). The CONLL 
tasks for 2008 and 2009 (Surdeanu et ah, 2008; 
Hajic et ah, 2009) has focused on unifying many of 
these individual efforts to produce a logical structure 
for multiple parts of speech and multiple languages. 

Like the CONLL shared task, we link surface lev¬ 
els to logical levels for multiple languages. How¬ 
ever, there arc several differences: (1) The logical 
structures produced automatically by our system can 
be expected to be more accurate than the compara¬ 
ble CONLL systems because our task involves pre¬ 
dicting semantic roles with less fine-grained distinc¬ 
tions. Our English and Japanese results were higher 
than the CONLL 2009 SRL systems. Our English F- 
scores range from 76.3% (spoken) to 89.9% (News): 
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the best CONLL 2009 English scores were 73.31% 
(Brown) and 85.63% (WSJ). Our Japanese system 
scored 90.6%: the best CONLL 2009 Japanese score 
was 78.35%. Our Chinese system 74.5%, 4 points 
lower than the best CONLL 2009 system (78.6%), 
probably due to our system's failings, rather than the 
complexity of the task; (2) Each of the languages 
in our system uses the same linguistic framework, 
using the same types of relations, same analyses of 
comparable constructions, etc. In one case, this re¬ 
quired a conversion from a different framework to 
our own. In contrast, the 2009 CONLL task puts 
several different frameworks into one compatible in¬ 
put format. (3) The logical structures produced by 
our system typically connect all the words in the sen¬ 
tence. While this is true for some of the CONLL 
2009 languages, e.g., Czech, it is not true about 
all the languages. In particular, the CONLL 2009 
English and Chinese logical structures only include 
noun and verb predicates. 

In this paper, we will describe the GLARE frame¬ 
work (Grammatical and Logical Representation 
Lramework) and a system for producing GLARE 
output (Meyers et ah, 2001; Meyers, 2008). GLARE 
provides a logical structure for English, Chinese and 
Japanese with an L-score that is within a few per¬ 
centage points of the best parsing results for that 
language. Like LLG’s (LEG) L-structure, our log¬ 
ical structure is less fine-grained than many of the 
popular semantic role labeling schemes, but also has 
two main advantages over these schemes: it is more 
reliable and it is more comprehensive in the sense 
that it covers all parts of speech and the resulting 
logical structure is a connected graph. Our approach 
has proved adequate for three genetically unrelated 
natural languages: English, Chinese and Japanese. 
It is thus a good candidate for additional languages 
with accurate parsers. 

2 The GLARF framework 

Our system creates a multi-tiered representation in 
the GLARE framework, combining the theory un¬ 
derlying the Penn Treebank for English (Marcus et 
al., 1994) and Chinese (Xue et al., 2005) (Chom- 
skian linguistics of the 1970s and 1980s) with: (2) 
Relational Grammar’s graph-based way of repre¬ 
senting “levels” as sequences of relations; (2) Lea- 


ture structures in the style of Head-Driven Phrase 
Structure Grammar; and (3) The Z. Harris style goal 
of attempting to regularize multiple ways of saying 
the same thing into a single representation. Our 
approach differs from LEG L-structure in several 
ways: we have more than two levels; we have a 
different set of relational labels; and finally, our ap¬ 
proach is designed to be compatible with the Penn 
Treebank framework and therefore, Penn-Treebank- 
based parsers. In addition, the expansion of our the¬ 
ory is governed more by available resources than by 
the underlying theory. As our main goal is to use 
our system to regularize data, we freely incorporate 
any analysis that fits this goal. Over time, we have 
found ways of incorporating Named Entities, Prop- 
Bank, NomBank and the Penn Discourse Treebank. 
Our agenda also includes incorporating the results of 
other research efforts (Pustejovsky et ah, 2005). 

Lor each sentence, we generate a feature structure 
(PS) representing our most complete analysis. We 
distill a subset of this information into a dependency 
structure governed by theoretical assumptions, e.g., 
about identifying functors of phrases. Each GLARE 
dependency is between a functor and an argument, 
where the functor is the head of a phrase, conjunc¬ 
tion, complementizer, or other function word. We 
have built applications that use each of these two 
representations, e.g., the dependency representation 
is used in (Shinyama, 2007) and the PS represen¬ 
tation is used in (K. Parton and K. R. McKeown 
and R. Coyne and M. Diab and R. Grishman and 
D. Hakkani-Tur and M. Harper and H. Ji and W. Y. 
Ma and A. Meyers and S. Stolbach and A. Sun and 
G. Tiir and W. Xu and S. Yarman, 2009). 

In the dependency representation, each sentence 
is a set of 23 tuples, each 23-tuple characterizing up 
to three relations between two words: (1) a SUR- 
PACE relation, the relation between a functor and an 
argument in the parse of a sentence; (2) a LOGIC 1 
relation which regularizes for lexical and syntac¬ 
tic phenomena like passive, relative clauses, deleted 
subjects; and (3) a LOGIC2 relation corresponding 
to relations in PropBank, NomBank, and the Penn 
Discourse Treebank (PDTB). While the full output 
has all this information, we will limit this paper to 
a discussion of the LOGIC 1 relations. Pigure 1 is 
a 5 tuple subset of the 23 tuple GLARE analysis of 
the sentence Who was eaten by Grendel? (The full 
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Figure 1: 5-tuples: Who was eaten by Grendel 
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Figure 2: Graph of Who was eaten by Grendel 

23 tuples include unique ids and fine-grained lin¬ 
guistic features). The fields listed are: logic 1 label 
(LI), surface label (Surf), logic2 label (L2), func¬ 
tor (Func) and argument (Arg). NIL indicates that 
there is no relation of that type. Figure 2 repre¬ 
sents this as a graph. For edges with two labels, 
the ARGO or ARG1 label indicates a LOGIC2 re¬ 
lation. Edges with an L- prefix are LOGIC 1 la¬ 
bels (the edges are curved); edges with S-prefixes 
are SURFACE relations (the edges are dashed); and 
other (thick) edges bear unprefixed labels represent¬ 
ing combined SURFACE/LOGIC 1 relations. Delet¬ 
ing the dashed edges yields a LOGIC 1 representa¬ 
tion; deleting the curved edges yields a SURFACE 
representation; and a LOGIC2 consists of the edges 
labeled ARGO and ARG1 relations, plus the sur¬ 
face subtrees rooted where the LOGIC2 edges ter¬ 
minate. Taken together, a sentence’s SURFACE re¬ 
lations form a tree; the LOGIC 1 relations form a 
directed acyclic graph; and the LOGIC2 relations 
form directed graphs with some cycles and, due to 
PDTB relations, may connect sentences to previous 
ones, e.g., adverbs like however, take the previous 
sentence as one of their arguments. 

LOGIC 1 relations (based on Relational Gram¬ 
mar) regularize across grammatical and lexical al¬ 


ternations. For example, subcategorized verbal ar¬ 
guments include: SBJect, OBJect and IND-OBJ (in¬ 
direct Object), COMPlement, PRT (Particle), PRD 
(predicative complement). Other verbal modifiers 
include AUXilliary, PARENthetical, ADVerbial. In 
contrast, FrameNet and PropBank make finer dis¬ 
tinctions. Both PP arguments of consulted in John 
consulted with Mary about the project bear COMP 
relations with the verb in GLARE but would have 
distinct labels in both PropBank and FrameNet. 
Thus Semantic Role Labeling (SRL) should be more 
difficult than recognizing LOGIC 1 relations. 

Beginning with Penn Treebank II, Penn Treebank 
annotation includes Function tags, hyphenated addi¬ 
tions to phrasal categories which indicate their func¬ 
tion. There arc several types of function tags: 

• Argument Tags such as SBJ, OBJ, 10 (IND- 
OBJ), CLR (COMP) and PRD-These are lim¬ 
ited to verbal relations and not all arc used in 
all treebanks. For example, OBJ and 10 are 
used in the Chinese, but not the English tree- 
bank. These labels can often be directly trans¬ 
lated into GLARF LOGIC 1 relations. 

• Adjunct Tags such as ADV, TMP, DIR, LOC, 
MNR, PRP-These tags often translate into a 
single LOGIC 1 tag (ADV). However, some of 
these also correspond to LOGIC 1 arguments. 
In particular - , some DIR and MNR tags are re¬ 
alized as LOGIC 1 COMP relations (based on 
dictionary entries). The tine grained seman¬ 
tic distinctions are maintained in other features 
that are part of the GLARF description. 

In addition, GLARF treats Penn’s PRN phrasal 
category as a relation rather than a phrasal category. 
For example, given a sentence like. Banana ketchup, 
the agency claims, is very nutritious, the phrase 
the agency claims is analyzed as an S(entence) in 
GLARF bearing a (surface) PAREN relation to the 
main clause. Furthermore, the whole sentence is a 
COMP of the verb claims. Since PAREN is a SUR¬ 
FACE relation, not a LOGIC 1 relation, there is no 
LOGIC 1 cycle as shown by the set of 5-tuples in 
Figure 3- a cycle only exists if you include both 
SURFACE and LOGIC 1 relations in a single graph. 

Another important feature of the GLARF frame¬ 
work is transparency, a term originating from N. 
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Figure 3: 5-tuples: Banana Ketchup, the agency claims, 
is very nutritious 
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Figure 4: 5-tuples: John and Mary ate the box of cookies 

Sager’s unpublished work. A relation between two 
words is transparent if: the functor fails to character¬ 
ize the selectional properties of the phrase (or sub¬ 
graph in a Dependency Analysis), but its argument 
does. For example, relations between conjunctions 
(e.g., and, or, but ) and their conjuncts are transparent 
CONJ relations. Thus although and links together 
John and Mary, it is these dependents that deter¬ 
mine that the resulting phrase is noun-like (an NP 
in phrase structure terminology) and sentient (and 
thus can occur as the subject of verbs like ate). An¬ 
other common example of transparent relations arc 
the relations connecting certain nouns and the prepo¬ 
sitional objects under them, e.g., the box of cookies 
is edible, because cookies arc edible even though 
boxes arc not. These features arc marked in the 
NOMLEX-PLUS dictionary (Meyers et al., 2004b). 
In Figure 4, we represent transparent relations, by 
prefixing the LOGIC 1 label with asterisks. 

The above description most accurately describes 
English GLARE However, Chinese GLARF has 
most of the same properties, the main exception be¬ 
ing that PDTB arguments are not currently marked. 


For Japanese, we have only a preliminary represen¬ 
tation of LOGIC2 relations and they are not derived 
from PropBank/NomBank/PDTB. 

2.1 Scoring the LOGIC1 Structure 

For purposes of scoring, we chose to focus on 
LOGIC 1 relations, our proposed high-performance 
level of semantics. We scored with respect to: the 
LOGIC 1 relational label, the identity of the functor 
and the argument, and whether the relation is trans¬ 
parent or not. If the system output differs in any of 
these respects, the relation is marked wrong. The 
following sections will briefly describe each system 
and present an evaluation of its results. 

The answer keys for each language were created 
by native speakers editing system output, as repre¬ 
sented similarly to the examples in this paper, al¬ 
though part of speech is included for added clar¬ 
ity. In addition, as we attempted to evaluate logi¬ 
cal relation (or dependency) accuracy independent 
of sentence splitting. We obtained sentence divi¬ 
sions from data providers and treebank annotation 
for all the Japanese and most of the English data, but 
used automatic sentence divisions for the English 
BLOG data. For the Chinese, we omitted several 
sentences from our evaluation set due to incorrect 
sentence splits. The English and Japanese answer 
keys were annotated by single native speakers ex¬ 
pert in GLARF. The Chinese data was annotated by 
several native speakers and may have been subject 
to some interannotator agreement difficulties, which 
we intend to resolve in future work. Currently, cor¬ 
recting system output is the best way to create an¬ 
swer keys due to certain ambiguities in the frame¬ 
work, some of which we hope to incorporate into fu¬ 
ture scoring procedures. For example, consider the 
interpretation of the phrase //Ve acres of land in Eng¬ 
land with respect to PP attachment. The difference 
in meaning between attaching the PP in England 
to acres or to land is too subtle for these authors- 
we have difficulty imagining situations where one 
statement would be accurate and the other would 
not. This ambiguity is completely predictable be¬ 
cause acres is a transparent noun and similar ambi¬ 
guities hold for all such cases where a transparent 
noun takes a complement and is followed by a PP 
attachment. We believe that a more complex scor¬ 
ing program could account for most of these cases. 
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Si mi lar complexities arise for coordination and sev¬ 
eral other phenomena. 

3 English GLARF 

We generate English GLARF output by applying a 
procedure that combines: 

1. The output of the 2005 version of the Charniak 
parser described in (Charniak, 2001), which 
label precision and recall scores in the 85% 
range. The updated version of the parser seems 
to perform closer to 90% on News data and per¬ 
form lower on other genres. That performance 
would reflect reports on other versions of the 
Charniak parser for which statistics arc avail¬ 
able (Foster and van Genabith, 2008). 

2. Named entity (NE) tags from the JET NE sys¬ 
tem (Ji and Grishman, 2006), which achieves 
F-scores ranging 86%-91% on newswire for 
both English and Chinese (depending on 
Epoch). The JET system identifies seven 
classes of NEs: Person, GPE, Location, Orga¬ 
nization, Facility, Weapon and Vehicle. 

3. Machine Readable dictionaries: COMLEX 
(Macleod et ah, 1998), NOMBANK dictio¬ 
naries (from http://nlp.cs.nyu.edu/ 
meyers/nombank/) and others. 

4. A sequence of hand-written rules (citations 
omitted) such that: (1) the first set of rules con¬ 
vert the Penn Treebank into a Feature Structure 
representation; and (2) each rule N after the 
first rule is applied to an entire Feature Struc¬ 
ture that is the output of rule N — 1. 

For this paper, we evaluated the English output for 
several different genres, all of which approximately 
track parsing results for that genre. For written 
genres, we chose between 40 and 50 sentences. 
For speech transcripts, we chose 100 sentences-we 
chose this larger number because a lot of so-called 
sentences contained text with empty logical de¬ 
scriptions, e.g., single word utterances contain no 
relations between pairs of words. Each text comes 
from a different genre. For NEWS text, we used 50 
sentences from the aligned Japanese-English data 
created as paid of the JENAAD corpus (Utiyama 
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Table 1: English Aggregate Scores 


Corpus 

Prec 

Rec 
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Sents 

NEWS 

90.5% 

90.8% 

90.6% 

50 

BLOG 

84.1% 

79.6% 

81.7% 

46 

LETT 

93.9% 

89.2% 

91.4% 

46 

TELE 

81.4% 

83.2% 

84.9% 

103 

NARR 

77.1% 

78.1% 

79.5% 

100 


Table 2: English Score per Sentence 


and I Sahara. 2003); the web text (BLOGs) was 
taken from some corpora provided by the Linguistic 
Data Consortium through the GALE (http: 
//projects . ldc . upenn . edu/gale/) pro¬ 
gram; the LETTer genre (a letter from Good Will) 
was taken from the ICIC Corpus of Fundraising 
Texts (Indiana Center for Intercultural Communi¬ 
cation); Finally, we chose two spoken language 
transcripts: a TELEphone conversation from 

the Switchboard Corpus (http://www.ldc. 
upenn.edu/Catalog/readme_flies/ 
switchboard.readme.html) and one NAR- 
Rative from the Charlotte Narrative and Conversa¬ 
tion Collection (http : //newsouthvoices . 
uncc.edu/cncc.php). In both cases, we 
assumed perfect sentence splitting (based on Penn 
Treebank annotation). The ICIC, Switchboard 
and Charlotte texts that we used are paid of the 
Open American National Corpus (OANC), in 
particular, the SIGANN shared subcorpus of the 
OANC (http://nip.cs.nyu.edu/wiki/ 
corpuswg/ULA-OANC- 1) (Meyers et ah, 2007). 

Comparable work for English includes: (1) (Gab¬ 
bard et ah, 2006), a system which reproduces the 
function tags of the Penn Treebank with 89% accu¬ 
racy and empty categories (and their antecedents) 
with varying accuracies ranging from 82.2% to 
96.3%, excluding null complementizers, as these are 
theory-internal and have no value for filling gaps. 
(2) Current systems that generate LFG F-structure 
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such as (Wagner et al., 2007) which achieve an F 
score of 91.1 on the F-structure PRED relations, 
which are similar to our LOGIC 1 relations. 

4 Chinese GLARF 

The Chinese GLARF program takes a Chinese 
Treebank-style syntactic parse and the output of a 
Chinese PropBanker (Xue, 2008) as input, and at¬ 
tempts to determine the relations between the head 
and its dependents within each constituent. It does 
this by first exploiting the structural information 
and detecting six broad categories of syntactic rela¬ 
tions that hold between the head and its dependents. 
These arc predication , modification , complementa¬ 
tion , coordination , auxiliary, and flat. Predication 
holds at the clause level between the subject and the 
predicate, where the predicate is considered to be 
the head and the subject is considered to the depen¬ 
dent. Modification can also hold mainly within NPs 
and VPs, where the dependents arc modifiers of the 
NP head or adjuncts to the head verb. Coordination 
holds almost for all phrasal categories where each 
non-punctuation child within this constituent is ei¬ 
ther conjunction or a conjunct. The head in a co¬ 
ordination structure is underspecified and can be ei¬ 
ther a conjunct or a conjunction depending on the 
grammatical framework. Complementation holds 
between a head and its complement, with the com¬ 
plement usually being a core argument of the head. 
For example, inside a PP, the preposition is the head 
and the phrase or clause it takes is the dependent. An 
auxiliary structure is one where the auxiliary takes 
a VP as its complement. This structure is identi¬ 
fied so that the auxiliary and the verb it modifies can 
form a verb group in the GLARF framework. Flat 
structures arc structures where a constituent has no 
meaningful internal structure, which is possible in a 
small number of cases. After these six broad cate¬ 
gories of relations arc identified, more fine-grained 
relation can be detected with additional information. 
Figure 5 is a sample 4-tuple for a Chinese translation 
of the sentence in figure 3. 

For the results reported in Table 3, we used the 
Harper and Huang parser described in (Harper and 
Huang, Forthcoming) which can achieve F-scores 
as high as 85.2%, in combination with informa¬ 
tion about named entities from the output of the 
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Figure 5: Agency claims. Banana Ketchup is very have 
nutrition DE. 

JET Named Entity tagger for Chinese (86%-91% L- 
measure as per section 3). We used the NE tags to 
adjust the parts of speech and the phrasal boundaries 
of named entities (we do the same with English). 
As shown in Table 3, we tried two versions of the 
Harper and Huang parser, one which adds function 
tags to the output and one that does not. The Chinese 
GLARF system scores significantly (13.9% F-score) 
higher given function tagged input, than parser out¬ 
put without function tags. Our current score is about 
10 points lower than the parser score. Our initial er¬ 
ror analysis suggests that the most common forms 
of errors involve: (1) the processing of long NPs; 
(2) segmentation and POS errors; (3) conjunction 
scope; and (4) modifier attachment. 

5 Japanese GLARF 

For Japanese, we process text with the KNP parser 
(Kurohashi and Nagao, 1998) and convert the output 
into the GLARF framework. The KNP/Kyoto Cor¬ 
pus framework is a Japanese-specific Dependency 
framework, very different from the Penn Treebank 
framework used for the other systems. Process¬ 
ing in Japanese proceeds as follows: (1) we pro¬ 
cess the Japanese with the Juman segmenter (Kuro- 
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No Function Tags Version 
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Table 3: 53 Chinese Newswire Sentences: Aggregate and 
Average Sentence Scores 


hashi et al., 1994) and KNP parser 2.0 (Kurohashi 
and Nagao, 1998), which has reported accuracy of 
91.32% F score for dependency accuracy, as re¬ 
ported in (Noro et ah, 2005). As is standard in 
Japanese linguistics, the KNP/Kyoto Corpus (K) 
framework uses a dependency analysis that has some 
features of a phrase structure analysis. In partic¬ 
ular, the dependency relations are between bun- 
setsu, small constituents which include a head word 
and some number of modifiers which are typically 
function words (particles, auxiliaries, etc.), but can 
also be prenominal noun modifiers. Bunsetsu can 
also include multiple words in the case of names. 
The K framework differentiates types of dependen¬ 
cies into: the normal head-argument variety, coor¬ 
dination (or parallel) and apposition. We convert 
the head-argument variety of dependency straight¬ 
forwardly into a phrase consisting of the head and 
all the arguments. In a similar way, appositive re¬ 
lations could be represented using an APPOSITIVE 
relation (as is currently done with English). In the 
case of bunsetsu, the task is to choose a head and 
label the other constituents-This is very similar to 
our task of labeling and subdividing the flat noun 
phrases of the English Penn Treebank. Conjunction 
is a little different because the K analysis assumes 
that the final conjunct is the functor, rather than a 
conjunction. We automatically changed this analy¬ 
sis to be the same as it is for English and Chinese. 
When there was no actual conjunction, we created a 
theory-internal NULL conjunction. The final stages 
include: ( 1 ) processing conjunction and apposition, 
including recognizing cases that the parser does not 
recognize; (2) correcting parts of speech; (3) label¬ 
ing all relations between arguments and heads; (4) 
recognizing and labeling special constituent types 
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Figure 6: It is the state’s duty to protect lives and assets. 
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Table 4: 40 Japanese Sentences from JENAA Corpus: 
Aggregate and Average Sentence Scores 

such as Named Entities, double quote constituents 
and number phrases (twenty one)', (5) handling com¬ 
mon idioms; and ( 6 ) processing light verb and cop¬ 
ula constructions. 

Figure 6 is a sample 4-tuple for a Japanese 
sentence meaning It is the state’s duty to protect 
lives and assets. Conjunction is handled as dis¬ 
cussed above, using an invisible NULL conjunction 
and transparent (asterisked) logical CONJ relations. 
Copulas in all three languages take surface subjects, 
which are the LOGIC 1 subjects of the PRD argu¬ 
ment of the copula. We have left out glosses for the 
particles, which act solely as case markers and help 
us identify the grammatical relation. 

We scored Japanese GLARF on forty sentences of 
the Japanese side of the JENAA data (25 of which 
arc parallel with the English sentences scored). Like 
the English, the F score is very close to the parsing 
scores achieved by the parser. 
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6 Concluding Remarks and Future Work 

In this paper, we have described three systems 
for generating GLARF representations automati¬ 
cally from text, each system combines the out¬ 
put of a parser and possibly some other processor 
(segmenter, Named Entity Recognizer, PropB anker, 
etc.) and creates a logical representation of the sen¬ 
tence. Dictionaries, word lists, and various other 
resources arc used, in conjunction with hand writ¬ 
ten rules. In each case, the results arc very close to 
parsing accuracy. These logical structures arc in the 
same annotation framework, using the same labeling 
scheme and the same analysis for key types of con¬ 
structions. There arc several advantages to our ap¬ 
proach over other characterizations of logical struc¬ 
ture: (1) our representation is among the most accu¬ 
rate and reliable; (2) our representation connects all 
the words in the sentence; and (3) having the same 
representation for multiple languages facilitates run¬ 
ning the same procedures in multiple languages and 
creating multilingual applications. 

The English system was developed for the News 
genre, specifically the Penn Treebank Wall Street 
Journal Corpus. We are therefore considering 
adding rules to better handle constructions that ap¬ 
peal - in other genres, but not news. The experi¬ 
ments describe here should go a long way towards 
achieving this goal. We are also considering ex¬ 
periments with parsers tailored to particular genres 
and/or parsers that add function tags (Harper et ah, 
2005). In addition, our current GLARF system uses 
internal Propbank/NomBank rules, which have good 
precision, but low recall. We expect that we achieve 
better results if we incorporate the output of state 
of the art SRL systems, although we would have to 
conduct experiments as to whether or not we can im¬ 
prove such results with additional rules. 

We developed the English system over the course 
of eight years or so. In contrast, the Chinese and 
Japanese systems are newer and considerably less 
time was spent developing them. Thus they cur¬ 
rently do not represent as many regularizations. One 
obstacle is that we do not currently use subcate¬ 
gorization dictionaries for either language, while 
we have several for English. In particular - , these 
would be helpful in predicting and filling relative 
clause and others gaps. We are considering auto¬ 


matically acquiring simple dictionaries by recording 
frequently occurring argument types of verbs over 
a larger corpus, e.g., along the lines of (Kawahara 
and Kurohashi, 2002). In addition, existing Japanese 
dictionaries such as the IPAL (monolingual) dictio¬ 
nary (technology Promotion Agency, 1987) or previ¬ 
ously acquired case information reported in (Kawa¬ 
hara and Kurohashi, 2002). 

Finally, we are investigating several avenues for 
using this system output for Machine Translation 
(MT) including: (1) aiding word alignment for other 
MT system (Wang et ah, 2007); and (2) aiding the 
creation various MT models involving analyzed text, 
e.g., (Gildea, 2004; Shen et al., 2008). 

Acknowledgments 

This work was supported by NSF Grant IIS- 
0534700 Structure Alignment-based MT. 

References 

C. F. Baker, C. J. Fillmore, and J. B. Lowe. 1998. The 
Berkeley FrameNet Project. In Coling-ACL98 , pages 
86-90. 

E. Charniak. 2001. Immediate-head parsing for language 
models. In ACL 2001, pages 116-123. 

N. Chomsky. 1957. Syntactic Structures. Mouton, The 
Hague. 

M. Collins. 1999. Head-Driven Statistical Models for 
Natural Language Parsing. Ph.D. thesis. University 
of Pennsylvania. 

J. Foster and J. van Genabith. 2008. Parser Evaluation 
and the BNC: 4 Parsers and 3 Evaluation Metrics. In 
LREC 2008, Marrakech, Morocco. 

R. Gabbard, M. Marcus, and S. Kulick. 2006. Fully pars¬ 
ing the penn treebank. In NAACUHLT, pages 184— 
191. 

D. Gildea. 2004. Dependencies vs. Constituents for 
Tree-Based Alignment. In EMNLP, Barcelona. 

J. Hajic, M. Ciaramita, R. Johansson, D. Kawahara, 
M. A. Marti, L. Marquez, A. Meyers, J. Nivre, S. Pado, 
J. Stepanek, P. Stranak, M. Surdeanu, N. Xue, and 
Y. Zhang. 2009. The CoNLL-2009 shared task: 
Syntactic and semantic dependencies in multiple lan¬ 
guages. In CoNLL-2009 , Boulder, Colorado, USA. 

M. Harper and Z. Huang. Forthcoming. Chinese Statis¬ 
tical Parsing. In J. Olive, editor. Global Autonomous 
Language Exploitation. Publisher to be Announced. 
M. Harper, B. Dorr, J. Hale, B. Roark, I. Shafran, 
M. Lease, Y. Liu, M. Snover, L. Yung, A. Krasnyan- 
skaya, and R. Stewart. 2005. Parsing and Spoken 


153 



Structural Event. Technical Report, The John-Hopkins 
University, 2005 Summer Research Workshop. 

Z. Harris. 1968. Mathematical Structures of Language. 
Wiley-Interscience, New York. 

J. R. Hobbs and R. Grishman. 1976. The Automatic 
Transformational Analysis of English Sentences: An 
Implementation. International Journal of Computer 
Mathematics , 5:267-283. 

H. Ji and R. Grishman. 2006. Analysis and Repair of 
Name Tagger Errors. In COLING/ACL 2006. Sydney, 
Australia. 

K. Parton and K. R. McKeown and R. Coyne and M. Diab 
and R. Grishman and D. Hakkani-Tiir and M. Harper 
and H. Ji and W. Y. Ma and A. Meyers and S. Stol- 
bach and A. Sun and G. Tiir and W. Xu and S. Yarman. 
2009. Who, What, When, Where, Why? Comparing 
Multiple Approaches to the Cross-Lingual 5W Task. 
In ACL 2009. 

D. Kawahara and S. Kurohashi. 2002. Fertilization 
of Case Frame Dictionary for Robust Japanese Case 
Analysis. In l'roc. of COLING 2002. 

S. Kurohashi and M. Nagao. 1998. Building a Japanese 
parsed corpus while improving the parsing system. In 
Proceedings of The 1st International Conference on 
Language Resources & Evaluation , pages 719-724. 

S. Kurohashi, T. Nakamura, Y. Matsumoto, and M. Na¬ 
gao. 1994. Improvements of Japanese Morpholog¬ 
ical Analyzer JUMAN. In Proc. of International 
Workshop on Sharable Natural Language Resources 
(SNLR). pages 22-28. 

C. Macleod, R. Grishman, and A. Meyers. 1998. COM- 
LEX Syntax. Computers and the Humanities, 31:459— 
481. 

M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. 
1994. Building a large annotated corpus of en- 
glish: The penn treebank. Computational Linguistics, 
19(2):313-330. 

A. Meyers, M. Kosaka, S. Sekine, R. Grishman, and 
S. Zhao. 2001. Parsing and GLARFing. In Proceed¬ 
ings of RANLP-2001 , Tzigov Chark, Bulgaria. 

A. Meyers, R. Reeves, C. Macleod, R. Szekely, V. Zielin- 
ska, B. Young, and R. Grishman. 2004a. The Nom- 
Bank Project: An Interim Report. In NAACUHLT 
2004 Workshop Frontiers in Corpus Annotation, 
Boston. 

A. Meyers, R. Reeves, Catherine Macleod, Rachel Szeke- 
ley, Veronkia Zielinska, and Brian Young. 2004b. The 
Cross-Breeding of Dictionaries. In Proceedings of 
LREC-2004, Lisbon, Portugal. 

A. Meyers, N. Ide, L. Denoyer, and Y. Shinyama. 2007. 
The shared corpora working group report. In Pro¬ 
ceedings of The Linguistic Annotation Workshop, ACL 
2007, pages 184-190, Prague, Czech Republic. 


A. Meyers. 2008. Using treebank, dictionaries and 
glarf to improve nombank annotation. In Proceedings 
of The Linguistic Annotation Workshop, LREC 2008, 
Marrakesh, Morocco. 

E. Miltsakaki, A. Joshi, R. Prasad, and B. Webber. 2004. 
Annotating discourse connectives and their arguments. 
In A. Meyers, editor, NAACUHLT 2004 Workshop: 
Frontiers in Corpus Annotation, pages 9-16, Boston, 
Massachusetts, USA, May 2 - May 7. Association for 
Computational Linguistics. 

T. Noro, C. Koike, T. Hashimoto, T. Tokunaga, and 
Hozumi Tanaka. 2005. Evaluation of a Japanese CFG 
Derived from a Syntactically Annotated corpus with 
Respect to Dependency Measures. In 2005 Workshop 
on Treebanks and Linguistic theories, pages 115-126. 

M. Palmer, D. Gildea, and P. Kingsbury. 2005. The 
Proposition Bank: An annotated corpus of semantic 
roles. Computational Linguistics, 31(1):71—106. 

J. Pustejovsky, A. Meyers, M. Palmer, and M. Poe- 
sio. 2005. Merging PropBank, NomBank, TimeBank, 
Penn Discourse Treebank and Coreference. In ACL 
2005 Workshop: Frontiers in Corpus Annotation II: 
Pie in the Sky. 

L. Shen, J. Xu, and R. Weischedel. 2008. A New String- 
to-Dependency Machine Translation Algorithm with a 
Target Dependency Language Model. In ACL 2008. 

Y. Shinyama. 2007. Being Lazy and Preemptive at 
Learning toward Information Extraction. Ph.D. the¬ 
sis, NYU. 

M. Surdeanu, R. Johansson, A. Meyers, LI. Marquez, 
and J. Nivre. 2008. The CoNLL-2008 Shared Task 
on Joint Parsing of Syntactic and Semantic Dependen¬ 
cies. In Proceedings of the CoNLL-2008 Shared Task, 
Manchester, GB. 

Information technology Promotion Agency. 1987. IPA 
Lexicon of the Japanese Language for Computers 
IPAL (Basic Verbs), (in Japanese). 

M. Utiyama and H. Isahara. 2003. Reliable Measures 
for Aligning Japanese-English News Articles and Sen¬ 
tences. In ACL-2003, pages 72-79. 

J. Wagner, D. Seddah, J. Foster, and J. van Genabith. 
2007. C-Structures and F-Structures for the British 
National Corpus. In Proceedings of the Twelfth In¬ 
ternational Lexical Functional Grammar Conference, 
Stanford. CSLI Publications. 

C. Wang, M. Collins, and P. Koehn. 2007. Chinese syn¬ 
tactic reordering for statistical machine translation. In 
EMNLP-CoNLL 2007, pages 737-745. 

N. Xue, F. Xia, F. Chiou, and M. Palmer. 2005. The 
Penn Chinese TreeBank: Phrase Structure Annotation 
of a Large Corpus. Natural Language Engineering, 
11:207-238. 

N. Xue. 2008. Labeling Chinese Predicates with Seman¬ 
tic roles. Computational Linguistics, 34:225-255. 


154 



